Abstract

ZE3RA is the software package responsible for processing the raw data from the 
ZEPLIN-III dark matter experiment and its reduction into a set of parameters 
used in all subsequent analyses. The detector is a liquid xenon time projection 
chamber with scintillation and electroluminescence signals read out by an array 
of 31 photomultipliers. The dual range 62-channel data stream is optimised 
for the detection of scintillation pulses down to a single photoelectron and of 
ionisation signals as small as those produced by single electrons. We discuss 
in particular several strategies related to data filtering, pulse finding and pulse 
clustering which are tuned to recover the best electron/nuclear recoil discrimi- 
nation near the detection threshold, where most dark matter elastic scattering 
signatures are expected. The software was designed assuming only minimal 
knowledge of the physics underlying the detection principle, allowing an unbi- 
ased analysis of the experimental results and easy extension to other detectors 
with similar requirements. 

Key words: ZEPLIN-III, liquid xenon detectors, dark matter, signal analyses, 
data reduction 



Corresponding author:
Email address: neves@coimbra.lip.pt (F. Neves)



1. Introduction 

The present work evolved within the ZEPLIN-III experiment, a two-phase 
(liquid/gas) xenon detector aiming to measure very low energy nuclear recoils 
produced by the interaction of dark matter WIMPs (Weakly Interacting Massive
Particles) [1 S, S, i] . Searches for rare particle interactions in low-background 
physics experiments are inherently difficult, and any data reduction and analysis 
software must address two main challenges. Firstly, unusual event topologies will 
almost inevitably appear in the long exposures required for the science acquisi- 
tions (lasting typically for many months or even years). These are not exercised 
by the various calibration runs and may result from localized instabilities or un- 
expected backgrounds, for example. Secondly, WIMP-nucleus interactions are 
expected to result in very small energy transfers to xenon atoms, and so the 
search for scintillation and ionization signatures from WIMPs goes down to the 
quantum of response in these channels, i.e. to the photoelectron level in scintil- 
lation and to single electrons in ionization. At this level, signals become sparser 
in time and therefore less recognizable; inevitably, statistically-motivated pulse- 
finding techniques are required in order to separate real pulses from noise or to 
distinguish them from unrelated pulses. When signals are not only rare but also 
extremely small, the potential for their mis-parametrization (and consequent 
detection inefficiency) is high. 

The ZEPLIN-III data Reduction and Analysis (ZE3RA) package must there- 
fore be very accurate at separating small signals from the noise, thus maximizing 
sensitivity to WIMPs. It must deal with unexpected rare artefacts arising in 
very long measurements; these include internal effects (e.g. occasional micro- 
discharges), changes in the local underground environment (pressure, temper- 
ature, structural movements), long-term electronic drifts and occasional elec- 
tromagnetic pick-up, and many others. It must also be very flexible since our 
knowledge of these effects improves as more data are accumulated. For these
reasons we opted for deferring the physics interpretation of the detector response 
to a later stage in the data analysis. ZE3RA provides robust reduction of the 
raw data acquisition output in the form of full pulse parametrization. Other 
requirements include a powerful event display and the possibility to deal with 
blind analyses, whereby specific events or datasets present in the data repository 
should not be displayed or reduced until the final analysis stage so as not to 
bias the signal estimation. 

In the context of direct WIMP searches, the ultimate benchmark of such a 
package is the level of electron/nuclear recoil discrimination. This is quan- 
tified by the relative number of events belonging to the heavily populated 
electron-recoil background population (mainly gamma-rays, with high ioniza- 
tion/scintillation ratios) that leak into the acceptance region of the signal (mim- 
icking nuclear recoils with low such ratios). ZEPLIN-III achieved the best dis- 
crimination of any two-phase xenon detector reported so far and, in spite of 
a relatively high gamma-ray background, achieved a sensitivity for WIMP sig- 
nals amongst the best in the world 0, S 01 ■ This relied in part on the ability 
of ZE3RA to parametrize events correctly. Some of the algorithms contained 
within it can be applied to other detector arrays, not only in the context of
underground experiments. 

2. Setup and Data Processing 

ZEPLIN-III is a two-phase (liquid/gas) xenon time projection chamber
containing ~12 kg of liquid xenon above a compact hexagonal array of 31 2-inch
photomultipliers (ETL D766QA) 0, H, Q. The photomultipliers (PMTs) are 
immersed directly in the liquid at a temperature of −105 °C and record both
the rapid scintillation signal (S1) and a delayed second signal (S2) produced by
proportional electroluminescence in the gas phase from charge drifted out of the 
liquid 0- The electric field in the active xenon volume is defined by a cathode 
wire grid 36 mm below the liquid surface and an anode plate 4 mm above the 
surface in the gas phase. These two electrodes define a drift field in the liquid 
of ~4 kV/cm and an electroluminescence field in the gas of ~8 kV/cm. A
second wire grid is located 5 mm below the cathode grid just above the PMT 
array. This grid defines a reverse field region which suppresses the collection of 
ionization charge for events just above the array and helps to isolate the PMT 
input optics from the strong external electric field. 

The PMT signals are digitized at 2 ns sampling over a time segment of 36 μs
starting at −20 μs from the trigger point. Each PMT signal is fed into two 8-bit
digitizers (ACQIRIS DC265) with a ×10 gain difference between them provided
by fast amplifiers (Phillips Scientific 770), to obtain both high (HS) and low (LS)
sensitivity readout covering a wide dynamic range. The PMT array is operated
from a common HV supply with attenuators (Phillips Scientific 804) used to
normalize their individual gains. The trigger is generated using the shaped sum
of the HS signals from all the PMTs. For the sake of illustration, Fig. 1 shows
the sums of all HS and LS channels for a low energy multiple scattering event
triggered by the first S2 signal.

2.1. Software architecture 

The efficient analysis of the huge amount of data produced by the acquisition
system (DAQ) demands its prior reduction to a set of relevant physical
parameters (pulse timing estimators, height, area, etc). In the ZEPLIN-III
experiment this task is performed by the ZE3RA software. ZE3RA is implemented in
C++ following a comprehensive class-oriented architecture mimicking the various
DAQ functional stages (run, event, channel, etc). The software design allows
new tools to be plugged in easily and different reduction templates to be built
for specific analyses. These features are of key importance for the optimization
of the analysis, which should produce consistent results over a large variety of
acquisition scenarios lasting from a few hours to several months (different field
configurations, calibration sources, etc). The ZE3RA architecture, illustrated
schematically in Fig. 2, includes the following skeleton classes:

• The ZDisplay class layers the end user interaction with the classes manag- 
ing the analysis and holding both the raw and the reduced data structures. 
Figure 1: Sum of all HS (left) and LS (right) channels for a low energy event having 3 energy
depositions in the liquid xenon sensitive volume. The time separation between the fast S1
signals (all overlapping in time) and the corresponding S2 is a measure of the depth of the
interaction. For each S2 pulse, the light distribution over the PMT array allows the position
of the interaction in the (x,y)-plane to be reconstructed.

• The ZRun class inherits from ZEvent and manages all the settings and 
the different reduction templates. 

• The ZEvent class stores and manages the access to individual events and
folded data structures (i.e. ZChannel). It inherits from ZRawFileHandler
and ZNtupleFileHandler which layer, respectively, the input from
the raw DAQ data files and the output to the databases containing the
reduced quantities. The ZRawFileHandler class inherits from ZBlindManager,
which implements the access policy to the raw data used in the WIMP
search.

• The ZChannel class stores and manages the data from individual channels
and all contained structures (i.e. ZPulse).

• The ZPulse and ZCluster classes store all information related to
individual pulses and clusters of sequential pulses, respectively. They inherit
from ZStatistic and ZPhysics which gather and maintain, respectively,
statistical and physical data.
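
The inheritance relations listed above can be summarised in a few lines. The sketch below is written in Python purely for brevity (ZE3RA itself is a C++ package); only the class names and the inheritance links are taken from the text, while the attributes `channels` and `pulses` are hypothetical placeholders.

```python
class ZBlindManager:
    """Access policy to the raw data used in the WIMP search."""


class ZRawFileHandler(ZBlindManager):
    """Input from the raw DAQ data files."""


class ZNtupleFileHandler:
    """Output to the databases holding the reduced quantities."""


class ZStatistic:
    """Gathers and maintains statistical data."""


class ZPhysics:
    """Gathers and maintains physical data."""


class ZPulse(ZStatistic, ZPhysics):
    """Information related to an individual pulse."""


class ZCluster(ZStatistic, ZPhysics):
    """Information related to a cluster of sequential pulses."""


class ZChannel:
    """Data from one channel and the pulses found in it."""

    def __init__(self):
        self.pulses = []      # hypothetical container of ZPulse/ZCluster objects


class ZEvent(ZRawFileHandler, ZNtupleFileHandler):
    """Access to one event and its channels."""

    def __init__(self):
        self.channels = []    # hypothetical container of ZChannel objects


class ZRun(ZEvent):
    """Settings and reduction templates for a whole run."""
```

With this layout a run object reaches the blinding policy through its base classes, which is one plausible reading of why the access control sits at the bottom of the file-handling chain.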

In the following subsections we present a short overview of the most relevant 
algorithms implemented and tested in the ZE3RA analysis framework for the 
ZEPLIN-III experiment. It should be noted that not all of those algorithms 
were used for the production of published results, either for the first or second 
science runs (in 2008 and 2010/11, respectively). In particular, the moving
average algorithm (§2.3.2) was preferred to the more refined wavelet analysis
(§2.3.4) as a trade-off between speed and performance. Rather than lessening
the purpose of the software package, this puts the emphasis on its design and 
architecture as a general framework easily extensible to different acquisition 
scenarios and similar experiments with specific needs. 
[Figure 2 diagram. Boxes: class ZRun (configuration settings; read raw DAQ file; event
manager; reduction templates; write output database), class ZDisplay, class ZEvent (channel
manager: HS and LS channels, HS and LS sums), class ZChannel (baseline mean and rms; pulse
manager; pulse search, resolve, cluster, match), class ZPulse and class ZCluster (time: start,
end; height, area; saturation; matches), class ZStatistic (statistical data) and class ZPhysics
(physical data).]

Figure 2: Schematic representation of the ZE3RA software architecture. 



2.2. Baseline characterization. 

The baseline is parametrized using the waveforms containing the actual PMT
signals. To avoid any bias due to the occurrence of transients or small spurious
signals, the parametrization method relies on a consistency check of the noise
distribution variance during a sufficiently large time window. For that purpose,
the DAQ pre-trigger region is divided into $i = 1..M_0$ consecutive regions
containing $m$ samples each. For each of these regions, the variances
$\{\sigma_i^2, i = 1..M_0\}$ of the signal amplitude distribution are calculated.
The F-distribution probability function ($Q$) is used to check whether the
variances are statistically consistent (Eq. 1). $Q$ is therefore the significance
level at which the hypothesis ($\sigma_i^2 = \sigma_{i+1}^2$, $i = 1..M_0$) can be
rejected. For each of the $M_0$ regions, the means $\{\mu_i, i = 1..M_0\}$
of the signal amplitude distribution are also calculated. The noise ($\sigma_{bas}$)
and mean ($\mu_{bas}$) characterizing each waveform are then defined (Eqs. 2 and 3)
for those $M$ regions satisfying $Q < Q_{crit}$. For the ZEPLIN-III analysis the
values of $Q_{crit} = 0.0001$ and $m = 25$ (50 ns) were used. The maximum length
of the total sampled waveform was 2 μs ($M_0 = 40$). For each event the
$\sigma_{bas}$, $\mu_{bas}$ and $M$ values are stored for all channels and can be
used, for example, to identify misbehaving baselines.

2.3. Raw data filtering. 

In order to enhance the signal-to-noise ratio and help with the identification
of relevant pulse structures, a set of general filter algorithms is available
in ZE3RA. Besides the built-in filters, which are briefly described below, the
software framework allows new algorithms to be plugged in easily and used in any
defined configuration. It should be noted that the DAQ raw data is never
modified during the analysis; instead, an auxiliary buffer containing the filtered
data is maintained for every channel.

2.3.1. Moving average. 

The implemented moving average algorithm is a simple low-pass filter defined as

$$\tilde{y}_i = \frac{1}{2m+1} \sum_{k=i-m}^{i+m} y_k, \qquad (4)$$

where $\tilde{y}$ and $y$ represent, respectively, the filtered and raw data buffers.
Although appealing because of its speed, the moving average produces a significant
loss of information when pulses are narrow compared with the filter width
($2m+1$). The loss of information, or even distortion, of the filtered signal
($\tilde{y}$) arises because the weight given to the $k$-th data point in Eq. 4 is
independent of its distance to $i$. In ZEPLIN-III this presents a problem since
the S1 and S2 signals have very different time constants, ~30 ns and ~0.6 μs,
respectively (parametrized by the signal mean arrival time). If $m$ is tuned for
S2 then S1 information is lost. Conversely, setting $m$ to preserve S1 does not
help with the detection of S2 pulses. These issues, related to the time scale at
which one is filtering or looking for a signal, are addressed by both the methods
described in §2.3.2 and §2.3.4.
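
As a minimal illustration of Eq. 4, the Python sketch below filters a buffer with a fixed half-width `m`. The function name and the truncation of the window at the buffer edges are assumptions made for this example; the text does not say how ZE3RA treats the edges.

```python
def moving_average(y, m):
    """Return a filtered copy of y using a fixed half-width m (Eq. 4).

    The raw buffer is left untouched; the window is truncated at the
    buffer edges (an assumption of this sketch).
    """
    n = len(y)
    filtered = []
    for i in range(n):
        lo, hi = max(0, i - m), min(n, i + m + 1)
        filtered.append(sum(y[lo:hi]) / (hi - lo))
    return filtered


# A one-sample pulse: a wide filter dilutes its height much more than a narrow one.
narrow_pulse = [0.0] * 10 + [10.0] + [0.0] * 10
height_m1 = max(moving_average(narrow_pulse, 1))   # 10/3
height_m5 = max(moving_average(narrow_pulse, 5))   # 10/11
```

The two heights show the loss of information described above: a filter wide enough for S2-like structures flattens an S1-like spike.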

2.3.2. Moving average with variable width. 

A solution to the problem highlighted above, which retains most of the
simplicity and speed of the moving average algorithm, is to adapt the width of the
filter (Eq. 4) based on some characterization $Q_y(i, m_i)$ of the local data being
analysed:

$$\tilde{y}_i = \frac{1}{2m_i+1} \sum_{k=i-m_i}^{i+m_i} y_k, \qquad (5)$$

with

$$m_{i+1} = \begin{cases} m_i + 1, & Q_y(i, m_i) < Q_{thr} \wedge m_i < m_{max} \\ m_i - 1, & Q_y(i, m_i) > Q_{thr} \wedge m_i > m_{min} \end{cases}$$

where $m_{min}$, $m_{max}$ and $Q_{thr}$ are user-defined parameters controlling the
limits and the adaptation of the filter width. In the ZEPLIN-III analysis
$Q_y(i, m_i)$ was set to the variance of $y$ calculated in the $[i - m_i, i + m_i]$
interval. Typical values of $m_{min}$ ~ 35 (70 ns), $m_{max}$ ~ 100 (0.2 μs) and
$Q_{thr}^{1/2} = 3\sigma_{bas}$ (Eq. 2) show good performance in terms of both S1
and S2 pulse detection. Figures 3 and 4 show examples of applying this filter to
typical ZEPLIN-III events and its comparison with the results obtained using the
algorithm described in §2.3.4.
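
The width adaptation of Eq. 5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the ZE3RA code: the filter is assumed to start at `m_min`, the window is truncated at the buffer edges, and the function also returns the widths used so that the adaptation can be inspected.

```python
def adaptive_moving_average(y, m_min, m_max, q_thr):
    """Moving average whose half-width follows the local variance (Eq. 5)."""
    n = len(y)
    m = m_min                                  # assumption: start from the narrowest filter
    filtered, widths = [], []
    for i in range(n):
        lo, hi = max(0, i - m), min(n, i + m + 1)   # assumption: truncate at the edges
        window = y[lo:hi]
        mean = sum(window) / len(window)
        # Q_y(i, m_i): variance of the raw data in [i - m_i, i + m_i]
        q = sum((v - mean) ** 2 for v in window) / len(window)
        filtered.append(mean)
        widths.append(m)
        if q < q_thr and m < m_max:
            m += 1                             # quiet region: widen the filter
        elif q > q_thr and m > m_min:
            m -= 1                             # structured region: narrow the filter
    return filtered, widths
```

On a flat baseline the width grows by one sample per step until it reaches `m_max`; when a fast pulse enters the window the variance exceeds `q_thr` and the width shrinks again towards `m_min`.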

2.3.3. Fourier analysis. 

Besides the random noise intrinsic to any signal processing chain, in ZEPLIN- 
III one can also observe the occurrence of coherent noise. This noise is most 
often induced by electric equipment working in close vicinity to the detector or 
following the saturation of the amplifiers. In both cases the noise exhibits time 
periodicity and is composed of a small set of characteristic frequencies. These 
properties make the Fast Fourier Transform (FFT) analysis particularly suitable 
for the identification and removal of such occurrences. 

The FFT is an efficient computational tool to calculate the Fourier transform
of a function ($y$) sampled at a finite number of $N$ points $\{y_n, n = 0..N-1\}$ [?].
In the frequency domain the amplitude of the component $f_n = n/(N\Delta)$ is given
by

$$H(f_n) = \Delta \sum_{k=0}^{N-1} y_k e^{2\pi i k n/N} = \Delta \sum_{k=0}^{N-1} y_k \left[\cos(2\pi k n/N) + i \sin(2\pi k n/N)\right], \qquad (6)$$

where $\Delta$ represents the sampling time interval. In ZE3RA the coefficients
$H(f_n)$ are calculated using the FFTW library [?].

This FFT tool was used only on a dedicated dataset acquired to study single
electron emission from the liquid into the gas phase [10]. For the acquisition
of that dataset the DAQ was triggered externally with a pulse generator which
accidentally induced coherent noise into the HS channels. After calculating
the FFT coefficients for each timeline, a 10-pole Butterworth filter was used to
attenuate $H(f_n)$ (Eq. 6) corresponding to the set of noise frequencies
$\{f_{noise}\}$ contained in each channel. This procedure allowed successful
recovery of the signal, as shown by the comparison of the single electron emission
results obtained using this particular dataset with the results from another
method described in Ref. [10] using WIMP search data sets.
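
The procedure just described (transform, attenuate the coefficients at the noise frequencies, transform back) can be sketched in Python. The sketch evaluates Eq. 6 with a naive O(N²) sum instead of FFTW, and the band-stop shape, its `width` parameter and all function names are assumptions made for illustration; the text only states that a 10-pole Butterworth filter was applied to the affected coefficients.

```python
import cmath
import math


def dft(y, delta=1.0, sign=+1):
    """Naive evaluation of Eq. 6: H(f_n) = delta * sum_k y_k exp(sign*2*pi*i*k*n/N)."""
    n_pts = len(y)
    return [delta * sum(y[k] * cmath.exp(sign * 2j * math.pi * k * n / n_pts)
                        for k in range(n_pts))
            for n in range(n_pts)]


def notch_gain(f, f0, width, poles=10):
    """Butterworth band-stop magnitude centred on f0 (assumed filter shape)."""
    d = f * f - f0 * f0
    if d == 0.0:
        return 0.0
    x = min(abs(f * width / d), 1e6)           # cap to avoid float overflow near f0
    return 1.0 / math.sqrt(1.0 + x ** (2 * poles))


def remove_coherent_noise(y, noise_freqs, width, delta=1.0):
    """Attenuate the listed frequencies and return the filtered timeline."""
    n_pts = len(y)
    coeffs = dft(y, delta)
    for n in range(n_pts):
        f = min(n, n_pts - n) / (n_pts * delta)    # fold the negative frequencies
        for f0 in noise_freqs:
            coeffs[n] *= notch_gain(f, f0, width)
    # Inverse transform: opposite sign, normalised by N * delta**2.
    back = dft(coeffs, delta, sign=-1)
    return [(v / (n_pts * delta * delta)).real for v in back]
```

For example, a flat timeline with a superimposed sinusoid at one of the listed frequencies is returned flat, while components far from the notch pass essentially unchanged.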

2.3.4. Wavelet analysis.

Because the signal is encoded in terms of both the amplitude and the phase of sines
or cosines (Eq. 6), the FFT analysis gives no direct information about the time
of occurrence of transients [?]. One solution to this problem would be to divide the
entire domain into small regions and analyse them separately. However, this
method would imply the loss of information at frequencies whose period is long
compared with the duration of the region being analysed. Another solution, which
preserves both time and frequency information, is to use the Discrete Wavelet
Transform (DWT) analysis [?].

The DWT scales ($s$) and shifts ($\tau$) a mother function $\mathcal{F}$ along the
time domain while recording its level of correlation with the signal into a set of
coefficients $w(s, \tau)$:

$$w(s, \tau) = \sum_{i=-\infty}^{+\infty} y_i \, \mathcal{F}_{(s,\tau),i}, \qquad (7)$$

where $y$ represents the raw data stream and $\mathcal{F}_{(s,\tau)}$ are the set of
basis functions obtained from scaling and translating $\mathcal{F}$ at fixed
$(s, \tau)$ steps. Unlike the sinusoidal functions which define a unique FFT of the
signal, there are many possibilities for $\mathcal{F}$, producing different DWT
coefficients. The choice of $\mathcal{F}$ is based on the trade-off between
localization and smoothing (time/frequency domain) required for a particular
application. The ZE3RA framework simply wraps the DWT code available from GSL
and expands its functionality to facilitate the manipulation of the coefficients
$w(s, \tau)$ (Eq. 7). The available $\mathcal{F}$ family functions are: Haar,
Daubechies and biorthogonal b-spline [11].

Currently ZE3RA provides two different noise removal and smoothing algorithms
based on DWT decomposition. The first of these algorithms implements
the soft threshold technique described in Ref. [12] for a uniformly distributed
Gaussian noise: for a given scale $s$ the $N_s$ wavelet coefficients are translated
towards 0 by an amount

$$\delta_s = \frac{\sqrt{2 \ln N_s}}{0.6745} \, \mathrm{MAD}, \qquad (8)$$

where MAD represents the median absolute deviation of $\{w_j(s, \tau), j = 1..N_s\}$.
The second algorithm combines data smoothing with the edge detection and
preservation method described in [13]. The edge detection is based on the
multiscale behavior of the local maxima of the wavelet coefficients moduli (Eq. 7).
The value of $|w(s, \tau)|$ measures the derivative of the smoothed signal at the
scale $s$, and a sharp variation of the signal (e.g. S1-like pulses) produces moduli
maxima at different scales [?]. After calculating the moduli of the wavelet
coefficients and finding local maxima, the implemented algorithm maps and stores
the inter-scale evolution of the $|w(s, \tau)|$ values. This information can be used
to study the Lipschitz regularity of the signal and to select further the edge/pulse
types to retain [13]. The signal coefficients $w(s, \tau)$ which do not belong to a
valid edge structure can either be smoothed using the soft threshold method
described above (Eq. 8) or simply reset to 0 at selected higher order scales (higher
frequencies). An additional benefit from using the second algorithm to smooth
the data is the intrinsic availability of the time position of the edges even in the
occurrence of a baseline drift. This information can later be matched against
the algorithm described in §2.5 to enhance the pulse finding efficiency.
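
The soft threshold of Eq. 8 acts on the coefficients of one scale at a time and can be sketched independently of the wavelet transform itself. In the Python sketch below the function names are hypothetical, and the MAD is taken as the median of the absolute deviations from the median, which is an assumption about the exact definition used.

```python
import math
import statistics


def universal_threshold(coeffs):
    """delta_s of Eq. 8 for the coefficients of a single scale."""
    med = statistics.median(coeffs)
    mad = statistics.median([abs(c - med) for c in coeffs])   # assumed MAD definition
    return math.sqrt(2.0 * math.log(len(coeffs))) / 0.6745 * mad


def soft_threshold(coeffs):
    """Translate every coefficient towards 0 by delta_s, clipping at 0."""
    delta = universal_threshold(coeffs)
    return [math.copysign(max(abs(c) - delta, 0.0), c) for c in coeffs]


# Seven noise-like coefficients and one large one carrying signal.
scale = [0.1, -0.2, 0.05, -0.1, 0.15, 10.0, -0.05, 0.0]
smoothed = soft_threshold(scale)
```

In this example the small coefficients are set to zero, while the large one survives, reduced by the threshold.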



Figures 3 and 4 show the comparison between the results of the wavelet
analysis with edge detection and of the moving average with adaptive width
(§2.3.2) when applied to two typical events. The DWT decomposition was
done into 14 scales using a bi-orthogonal b-spline mother function ($\mathcal{F}$)
of order (1,3) [?]. For the sake of illustration, both figures show the result of
clearing all coefficients above the 8th scale ($w(s > 8, \tau) = 0$) together with the
effect of keeping those up to $s = 10$ when belonging to an identified S1-like edge.
Besides improving the sensitivity to the general shape and start time of the
fast S1 signals (Fig. 4), smoothing the data using the edge detection algorithm
also improves the discrimination between S2 and S1 when they are very close
(Fig. 3). The latter class of events corresponds to energy depositions near the
surface of the liquid.







Figure 3: Comparison of the smoothing results obtained using the moving average with
variable width (green line) and a DWT decomposition keeping (red line) or ignoring (blue line)
the edge information. This event corresponds to an interaction just below the liquid xenon
surface. In this instance the small S1 pulse, shown also in the inset, precedes a much larger
S2 by a very short time.

2.4. Channel delays.

Due to differences in the signal processing chain (PMTs, cabling, amplifiers,
etc), all ZEPLIN-III channels exhibit relative delays, typically within
±10 ns. The correct alignment in time of all channels is crucial for the
performance of both the pulse finding and matching algorithms (§2.5 and §2.8,
respectively). This is especially relevant for signals corresponding to low energy
deposits, which constitute the region of interest in ZEPLIN-III.

Figure 4: Comparison of the smoothing results for a typical S1 pulse obtained using the
moving average with variable width (green line) and a DWT decomposition keeping (red line)
or ignoring (blue line) the edge information.

The individual channels are realigned in ZE3RA by defining the beginning of
the raw data buffer $y^{(j)}$ to point at the $k^{(j)}$-th element of the buffer
$Y^{(j)}$ containing the original data read from the DAQ,

$$k^{(j)} = \delta^{(j)} - \min\{\delta^{(j)}, j = 1..J\}, \qquad (9)$$

where $\delta^{(j)}$ is the individual delay for channel $j$ and $J$ represents the
total number of channels. The size $S_y$ for all $y$ buffers is calculated using

$$S_y = S_Y + \min\{\delta^{(j)}, j = 1..J\} - \max\{\delta^{(j)}, j = 1..J\}, \qquad (10)$$

where $S_Y$ is the size of the original $Y$ buffers. The very simple operations defined
in Eq. 9 and Eq. 10 avoid any extra time- or memory-consuming manipulations
aside from reading the DAQ files into buffers $\{Y^{(j)}, j = 1..J\}$.
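
The realignment of Eqs. 9 and 10 amounts to choosing a start offset and a common length for each channel. A minimal Python sketch is given below, assuming delays expressed in whole samples and raw buffers stored as `bytearray` objects (8-bit samples); `memoryview` slices are used so that no data are copied, in the spirit of the remark above. The function name is hypothetical.

```python
def align_channels(raw_buffers, delays):
    """Return zero-copy views y^(j) of the raw buffers Y^(j), aligned in time.

    raw_buffers: equal-length bytearray objects, one per channel.
    delays: per-channel delay delta^(j), in samples (may be negative).
    """
    d_min, d_max = min(delays), max(delays)
    size = len(raw_buffers[0]) + d_min - d_max        # Eq. 10
    aligned = []
    for buf, d in zip(raw_buffers, delays):
        k = d - d_min                                 # Eq. 9
        aligned.append(memoryview(buf)[k:k + size])
    return aligned
```

A pulse recorded at sample 8 + δ in each channel ends up at the same index in every aligned view, and all views share the reduced length given by Eq. 10.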
2.5. Pulse finding.

The pulse finding procedure implemented in ZE3RA consists of searching
for excursions of the signal amplitude above a defined threshold $V_{thr}$. However,
in the ZEPLIN-III detector both S1 and S2 signals can contain several of these
excursions depending on the distance from a particular PMT to the interaction
point and on the energy deposited. In particular, an S2 signal can consist
of a few tens of photoelectrons spread over a period of time of ~1 μs. Taking
advantage of the underlying structure revealed from filtering (§2.3), this problem
was partially solved in ZE3RA by first searching for pulses on the smooth data
buffer ($\tilde{y}$). One must keep in mind though that the effective enhancement
of smoothing depends on the applied filter and how sparsely the individual data
excursions occur (§2.3). Regardless of any loss of information in $\tilde{y}$, ZE3RA
keeps the sensitivity to the smallest structures by also searching for pulses in
the original data buffer ($y$). The final set of pulses available for all subsequent
analysis is the union of pulses collected from both $\tilde{y}$ and $y$. For the
ZEPLIN-III analysis $V_{thr}$ was chosen to be

$$V_{thr} = \mu_{bas} + 3\sigma_{bas}, \qquad (11)$$

where $\sigma_{bas}$ is the noise (Eq. 2) and $\mu_{bas}$ the mean (Eq. 3) values
characterizing each raw waveform (§2.2). This software threshold is nominally
equivalent to an energy threshold of only 1.67 keV for electron recoils detected
through S1. As an example, Figure 5 shows a larger pulse which is chosen from
$\tilde{y}$ with the remaining fastest pulses being picked from the unmodified $y$
buffer.
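
The threshold search and the union of the two pulse sets can be sketched in Python as below. Pulses are represented as half-open sample ranges `(start, end)`. How ZE3RA combines overlapping candidates from the two buffers is not specified in the text, so the rule used here (a raw-buffer pulse is kept only if it does not overlap a smooth-buffer pulse) is an assumption, as are the function names.

```python
def find_excursions(buf, v_thr):
    """Return (start, end) ranges where buf stays above v_thr (end exclusive)."""
    pulses, start = [], None
    for i, value in enumerate(buf):
        if value > v_thr and start is None:
            start = i
        elif value <= v_thr and start is not None:
            pulses.append((start, i))
            start = None
    if start is not None:
        pulses.append((start, len(buf)))
    return pulses


def union_of_pulses(smooth_pulses, raw_pulses):
    """Combine the pulses found in the smooth and raw buffers.

    Assumption of this sketch: raw pulses overlapping a smooth pulse are
    regarded as part of it and are dropped.
    """
    kept = list(smooth_pulses)
    for r in raw_pulses:
        if not any(r[0] < s[1] and s[0] < r[1] for s in smooth_pulses):
            kept.append(r)
    return sorted(kept)
```

With `v_thr` set from Eq. 11, `find_excursions` would be called once on the smooth buffer and once on the raw buffer of each channel, and the two lists passed to `union_of_pulses`.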

2.6. Pulse clustering. 

Deciding when to stop accruing small excursions above threshold into pulse
clusters is not straightforward and has important consequences for the detection
efficiency of small signals. For S1 pulses, for example, a fixed-length
integration, typically implemented as a coincidence window between channels, can
lead to unnecessary inclusion of noise and unequal integration efficiency for
different particle species (due to the different scintillation decay time constants).
Alternatively, one may cluster subsequent candidates into the pulse based on
time separation, although over-clustering can lead to run-away effects in this
instance. The fraction of the pulse area integrated in each approach can be
calculated analytically if the scintillation responses are known for each species,
but this calculation fails for very low photoelectron numbers, when the start
time of the pulse is not defined by the rise time of the scintillation signal but
rather by the delayed arrival of the first photoelectron. A detailed comparison
between the constant integration and the pulse gap methods was carried out
with a toy Monte Carlo accounting for the DAQ sampling rate, the width of the
single photoelectron response and the scintillation decay times and respective
intensities for electron and nuclear recoils in liquid xenon [14] (however, noise
or afterpulsing distributions were not included). This was used to calculate the
fraction ($\eta$) of S1 signals which is lost from the integrated pulse area. The
integration of S2 pulses is not so critical as $\eta$ is expected to be always very
small in this case.

Figure 5: Example of a ZEPLIN-III event with an S2 pulse (green line) being selected from an
excursion of $\tilde{y}$ (red line) above $V_{thr}$ (blue line). The fastest pulses on the
right tail of the S2 are chosen from $y$ (black line).

The simplest pulse clustering algorithm implemented in ZE3RA consists of
merging all pulses found within a time window of constant width ($t_{win}$). The
value of $t_{win}$ is chosen according to the characteristics of the signal (i.e. S1 or
S2). The respective Monte Carlo results are shown in Fig. 6 as a function of
the sampled number of photoelectrons for window sizes of 50 ns and 100 ns.
As expected, the missing area fraction $\eta$ is smaller for nuclear recoils (due to
the increase of the faster xenon scintillation component); significantly, $\eta$ is not
constant for small signals, but rather it decreases due to the delayed detection
of the first photoelectron, which is a purely statistical effect.

The alternative clustering algorithm implemented uses the time distance
between pulses to decide if they correspond to the same interaction in the liquid
xenon. The algorithm iterates through pulses and recursively merges consecutive
occurrences if the time elapsed between the end of the first and the beginning of
the next is smaller than a certain value $t_{gap}$. It should be noted that the
saturation tails from the amplifiers and the existence of afterpulsing signals
originating in the PMTs [15] can bias the results towards excessive clustering. To
mitigate these, one additional constraint is imposed in ZE3RA: that the clustering
should not extend out to more than a user-defined factor of the maximum mean
arrival time of photoelectrons for the pulses being clustered. The Monte Carlo
results are also shown in Fig. 6 for $t_{gap}$ = 25, 60 ns. For pairs of values
($t_{gap}$, $t_{win}$) returning similar values at the lowest number of sampled
photoelectrons, the performance of the gap method improves quickly with the
magnitude of the signal and almost independently of the type of particle.

Using real detector data, we verified that the time gap algorithm was more
robust than the constant time window option when dealing with different
acquisition scenarios (different fields, detector tilt or changes in the gas gap, etc)
and with the interaction of different particles in the liquid xenon (from gamma
and neutron calibrations). We also found that the constant integration method,
with $t_{win}$ tuned for specific conditions and single scatters in the liquid, often
clipped multiple overlapping pulses, therefore biasing their parametrization and
correct identification. These reasons, and the potential for better performance
on rare pulse topologies which may arise in very long exposures, led us to adopt
the time difference method, with a simple heuristic adjustment of $t_{gap}$ to 20 ns
and 100 ns for S1- and S2-like pulses, respectively.
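
The two clustering strategies compared in this section can be sketched in a few lines of Python. Pulses are `(start, end)` times in ns; the function names are hypothetical, the window is assumed to open at the start of the first pulse of each cluster, and the additional ZE3RA constraint on the mean arrival time is left out of this sketch.

```python
def cluster_by_window(pulses, t_win):
    """Merge all pulses starting within t_win of the first pulse of a cluster."""
    clusters = []
    for start, end in sorted(pulses):
        if clusters and start - clusters[-1][0] < t_win:
            clusters[-1][1] = max(clusters[-1][1], end)
        else:
            clusters.append([start, end])
    return [tuple(c) for c in clusters]


def cluster_by_gap(pulses, t_gap):
    """Merge consecutive pulses separated by less than t_gap (end to start)."""
    clusters = []
    for start, end in sorted(pulses):
        if clusters and start - clusters[-1][1] < t_gap:
            clusters[-1][1] = max(clusters[-1][1], end)
        else:
            clusters.append([start, end])
    return [tuple(c) for c in clusters]


# A train of five short pulses, each 5 ns after the previous one ends.
train = [(0, 10), (15, 25), (30, 40), (45, 55), (60, 70)]
by_gap = cluster_by_gap(train, 20)        # one cluster: (0, 70)
by_window = cluster_by_window(train, 50)  # clipped: (0, 55) and (60, 70)
```

The example shows both behaviours discussed above: the gap method keeps accruing pulses as long as they arrive closely enough, whereas the fixed window clips the tail of the train into a separate cluster.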




Figure 6: Comparison for both electron (ER) and nuclear (NR) recoils in the liquid xenon of
the fraction of lost area of S1 signal ($\eta$) when clustering pulses using a constant time
window ($t_{win}$ = 50, 100 ns) or the time gap between pulses ($t_{gap}$ = 25, 60 ns).
2.7. Multiple pulse resolution.

In the ZEPLIN-III experiment only events with one S1 and one S2 are
considered for most analyses (e.g. WIMP searches). Events corresponding to multiple
scatters can be promptly identified using the S2 channel if the individual energy
deposits occur at different depths in the LXe active volume. The time separation
between the S1 and S2 signals is equal to the time taken by the ionization electrons
to travel from the interaction site to the liquid surface along the drift field. This
mechanism is independent of the position of the interactions in the (x, y)-plane
which, in any case, can only be obtained using position reconstruction after the
correct identification and parametrization of the pulses.

The algorithm implemented in ZE3RA to identify and resolve multiple scatter
events with overlapping S2 signals consists simply of reusing the pulse finding
and clustering algorithms described in §2.5 and §2.6 for higher thresholds. For
each S2 a list of thresholds is obtained by scanning the smooth buffer ($\tilde{y}$)
and accumulating the values $V_l = \tilde{y}_l + V_{thr}$ (Eq. 11) obeying:

$$\tilde{y}_k - \tilde{y}_l > V_{thr} \wedge \tilde{y}_m - \tilde{y}_l > V_{thr}, \quad k < l < m, \qquad (12)$$

where $k$ and $m$ are constrained by the candidate pulse start and end times
($t_{start}$, $t_{end}$). The pulse finding and clustering algorithms are then applied to
the data sub-domain defined by ($t_{start}$, $t_{end}$) using the lowest value in
$\{V_l\}$. The algorithm recursively applies this method to every resolved S2 pulse
using the remaining threshold values in the $V$ list. To illustrate the procedure,
Figure 7 shows the set of test thresholds
$\{V_{11.7\,\mu s}, V_{7.2\,\mu s}, V_{9.9\,\mu s}\}$ for a ZEPLIN-III event
with 4 overlapping S2 pulses.
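
A simplified Python sketch of this procedure is given below. It works on sample indices rather than times, takes threshold candidates only at local minima of the smooth buffer (an assumption; Eq. 12 by itself would accept every sample of a sufficiently deep valley), and omits the clustering step. All function names are hypothetical.

```python
def find_excursions(buf, threshold, start=0, end=None):
    """(first, last+1) ranges where buf exceeds threshold inside [start, end)."""
    end = len(buf) if end is None else end
    pulses, first = [], None
    for i in range(start, end):
        if buf[i] > threshold and first is None:
            first = i
        elif buf[i] <= threshold and first is not None:
            pulses.append((first, i))
            first = None
    if first is not None:
        pulses.append((first, end))
    return pulses


def valley_thresholds(buf, start, end, v_thr):
    """Sorted list of V_l = buf[l] + v_thr for the valleys obeying Eq. 12."""
    found = []
    for l in range(start + 1, end - 1):
        is_min = buf[l] < buf[l - 1] and buf[l] <= buf[l + 1]   # local minima only
        if (is_min and max(buf[start:l]) - buf[l] > v_thr
                and max(buf[l + 1:end]) - buf[l] > v_thr):
            found.append(buf[l] + v_thr)
    return sorted(found)


def resolve(buf, start, end, thresholds):
    """Recursively split the pulse [start, end) using the ascending thresholds."""
    if not thresholds:
        return [(start, end)]
    parts = find_excursions(buf, thresholds[0], start, end)
    if len(parts) < 2:                       # this threshold does not split the pulse
        return resolve(buf, start, end, thresholds[1:])
    resolved = []
    for s, e in parts:
        resolved += resolve(buf, s, e, thresholds[1:])
    return resolved
```

Note that in this sketch the resolved pulses lose the samples lying below the splitting threshold; a fuller implementation would presumably recover them when the pulses are parametrized.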

2.8. Pulse matching. 

One frequent requirement of experiments using an array of photodetectors 
is to order in time and match signals from all DAQ channels. The simplest 
approach would be to have the ordering and matching function O depending 
both on the start and end times (t start, tend) of any two pulses {A, B} occurring 
in different channels {a, &}, 

{A precedes B , < ^fj rt 
A follows B ,t s il t >ti B J d . (13) 
A matches B , otherwise 

An obvious scenario where the above method fails is when the pulse limits are
extended by some sort of noise, an amplifier saturation tail, etc. Multiple
interactions with partially overlapping S2s can also drive Eq. 13 to faulty
results (§2.7). To solve these problems, an algorithm was implemented which
takes into account a relative quantification $w$ of the matching between pulses,

$$w(A, B) = \sum_{i \in \{A \cap B\}} v_i^{(a)} \times v_i^{(b)}, \qquad (14)$$

Figure 7: ZEPLIN-III event with 4 overlapping S2 pulses (light blue, magenta, yellow and
olive). The figure also shows the set of thresholds
$\{V_{11.7\,\mu s}, V_{7.2\,\mu s}, V_{9.9\,\mu s}\}$ gathered from scanning $y$ (red line)
and used to resolve the original pulse found from an excursion of the data past $V_{thr}$.



where $v^{(a)}$ and $v^{(b)}$ represent, respectively, the data streams containing
pulses $A$ and $B$. The algorithm starts by using Eq. 13 to order and group
overlapping pulses from the two channels $\{a, b\}$. For each one of these groups,
Eq. 14 is used to generate an overall matching quantity $W$ defined as,

$$W = \prod w(A_k, B_m)^{\delta_{km}}. \qquad (15)$$

The values of $\delta_{km}$ are initially set to 1 for all pairs of $(k, m)$ pulses.
A fast best-survival algorithm was then implemented which maximizes the value of
$W$ while setting $\delta_{km}$ to 0 (Eq. 15) for all but the best of the degenerate
matches $w(A_k, B_m)$ (Eq. 14). The application of this method in ZE3RA introduces
a flavour of most probable match, even if the true pulse shape is not considered,
which significantly increased the robustness of the pulse finding analysis.
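The idea can be illustrated with a short sketch. The text does not detail the best-survival algorithm, so a simple greedy selection is assumed here as a stand-in: candidate pairs are ranked by their overlap weight (Eq. 14) and a pair survives only if neither of its pulses has already been matched. A greedy pass of this kind is not guaranteed to find the global optimum in every configuration. All function names are hypothetical, and pulses are given as inclusive (start, end) sample indices.

```python
def overlap_weight(va, vb, pulse_a, pulse_b):
    """Sum of sample-by-sample products over the common time range of two pulses."""
    lo = max(pulse_a[0], pulse_b[0])
    hi = min(pulse_a[1], pulse_b[1])
    return sum(va[i] * vb[i] for i in range(lo, hi + 1))  # 0 if no overlap


def best_survivors(va, vb, pulses_a, pulses_b):
    """Keep, among competing (k, m) pairs, those with the largest overlap weight."""
    scored = []
    for k, a in enumerate(pulses_a):
        for m, b in enumerate(pulses_b):
            w = overlap_weight(va, vb, a, b)
            if w > 0:
                scored.append((w, k, m))
    scored.sort(reverse=True)  # strongest matches first
    used_a, used_b, kept = set(), set(), []
    for w, k, m in scored:
        if k not in used_a and m not in used_b:
            kept.append((k, m))
            used_a.add(k)
            used_b.add(m)
    return sorted(kept)


# Channel a: one pulse with a long tail; channel b: two separate pulses.
va = [0, 2, 5, 2, 1, 1, 1, 0, 0, 0]
vb = [0, 1, 4, 1, 0, 0, 3, 6, 3, 0]
print(best_survivors(va, vb, [(1, 6)], [(1, 3), (6, 8)]))  # [(0, 0)]
```

In the example, Eq. 13 alone would match the tailed pulse in channel a to both pulses in channel b. The weights (24 against 3) keep only the first pairing.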

The relation between pulses from different channels generated by the match- 
ing algorithm was used in several of the reduction stages in order to: 

• ignore LS pulses lacking a match in the respective HS channel (§2.5);

• cluster groups of scattered pulses in individual channels matching only one
pulse in the corresponding HS or LS sum channels (§2.7);

• disentangle the correspondence between pulses in individual channels with
overlapping pulses in the sum channels for multiple interactions at different
(x, y)-positions (§2.8);

• avoid misinterpretation of amplifier saturation tails at individual channels
as multiple pulses in the sum channels;

• map the correspondence between pulses or groups of pulses (clusters) in
all channels for the output databases containing the reduced quantities.

2.9. Pulse parametrization. 

Once the pulse start and end times have been set by the operations described
in sections 2.2 to 2.8, a number of parameters are extracted from the
raw data buffers. The actual list of parameters can depend on the pulse context 
information (e.g. whether the pulse has a saturation or ringing tail, etc) and is 
extensible to support any subsequent analysis. For the ZEPLIN-III analysis the 
reduced databases include: 

• pulse start time, width (between threshold crossings, at 10% and at 50% 
height), amplitude, area, signal mean arrival time, pulse symmetry, etc; 

• pulse matching mapping over different acquisition channels, etc.
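A few of the listed quantities can be computed as in the sketch below. It is only an illustration under simple assumptions: the buffer is baseline-subtracted with positive pulses, widths are counted in whole samples around the peak without interpolation, and the function and key names are hypothetical.

```python
def parametrize(v, start, end, dt=1.0):
    """Extract basic parameters from samples v[start..end] (inclusive), dt per sample."""
    samples = v[start:end + 1]
    total = sum(samples)
    amplitude = max(samples)
    peak = start + samples.index(amplitude)
    # Area-weighted mean arrival time of the signal.
    mean_time = sum(i * dt * s for i, s in enumerate(samples, start)) / total

    def width_at(fraction):
        # Contiguous samples around the peak at or above the given height fraction.
        level = fraction * amplitude
        left = right = peak
        while left > start and v[left - 1] >= level:
            left -= 1
        while right < end and v[right + 1] >= level:
            right += 1
        return (right - left + 1) * dt

    return {
        "start_time": start * dt,
        "amplitude": amplitude,
        "area": total * dt,
        "mean_time": mean_time,
        "width_10": width_at(0.10),
        "width_50": width_at(0.50),
    }


p = parametrize([0, 1, 4, 10, 6, 2, 1, 0], start=1, end=6)
print(p["area"], p["width_50"], p["width_10"])  # 24.0 2.0 6.0
```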

At any step of the analysis it may be convenient to access some characteristic
of an individual pulse prior to its formal and complete parametrization:
consider, for example, that one requires the area of a pulse before its
clustering (§2.6). For that purpose, any parameter operator $\alpha$ that, applied
to pulses $A$ and $B$, obeys

$$\alpha(A) + \alpha(B) = \alpha(A \cup B) \qquad (16)$$

is updated every time the pulse start or end time changes. This feature of the
framework significantly increases the performance of the analysis since it avoids
redundant loops over the data buffers for $\alpha$.
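The pulse area is the simplest operator obeying Eq. 16. The sketch below, with a hypothetical `TrackedPulse` class, shows how such a quantity can be kept up to date by visiting only the samples that enter or leave the pulse when its limits move, instead of rescanning the whole buffer. It is an illustration of the idea and not the ZE3RA implementation.

```python
class TrackedPulse:
    """Pulse over samples v[start..end] (inclusive) with an incrementally updated area."""

    def __init__(self, v, start, end):
        self.v = v
        self.start, self.end = start, end
        self.area = sum(v[start:end + 1])  # full loop over the pulse, done once

    def set_limits(self, start, end):
        """Move the pulse limits, touching only the samples gained or lost."""
        v = self.v
        if start < self.start:
            self.area += sum(v[start:self.start])      # samples gained on the left
        else:
            self.area -= sum(v[self.start:start])      # samples lost on the left
        if end > self.end:
            self.area += sum(v[self.end + 1:end + 1])  # samples gained on the right
        else:
            self.area -= sum(v[end + 1:self.end + 1])  # samples lost on the right
        self.start, self.end = start, end


v = [0, 1, 4, 10, 6, 2, 1, 0, 3, 5, 2]
p = TrackedPulse(v, 2, 4)
p.set_limits(1, 6)
print(p.area == sum(v[1:7]))  # True
```

By the same additivity, the area of a cluster of disjoint pulses is the sum of the stored areas of its members, so clustering needs no further pass over the data.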

3. Interface. 

The ZE3RA user interface mediates between the end user and the base analysis
framework and can be used in either graphical or batch operation mode.

The graphical interface mode was designed to allow easy navigation through 
events and channels while providing an inline tuning of the relevant analysis 
parameters and visualization of their output in terms of pulse identification and 
parametrization. To aid understanding of the detector output, the interface
also incorporates a 2D plot of the relative response of the array, the possibility
to plot together any set of high/low sensitivity channels on any time scale,
and mouse context information on pulse properties. The interface was coded
using the cross-platform GUI toolkit FLTK (Fast Light Toolkit). However, it is
worth noting that the core analysis classes are self-contained (class ZRun) and
independent of the graphical interface. This makes it possible to promptly
reuse the analysis framework with any available graphical tool or within the
DAQ environment for online monitoring of the detector. Significantly, it also
makes porting this framework to other detector systems quite straightforward.

The batch mode consists of a simple command line input designed mainly 
to perform the mass reduction of any set of data files with minimum user in- 
teraction (for example, the first science run data consisted of 13,234 binary files 
containing over 23 million triggered events). The analysis input parameters 
are fed as a configuration file which can easily be created from the graphical 
interface. 

An important feature affecting both interface modes is the built-in blind 
manager. This manager allows a super-user to set, based on a simple set of 
rules, which events or datasets can be visualized and reduced at a specific step 
of the WIMP search analysis. Such signal-blind analyses are an essential tool
in rare event searches.

4. Conclusions 

A robust and versatile software package was developed for the analysis and 
reduction of the raw data from the ZEPLIN-III experiment. The framework 
allows the easy plug-in of new tools and building of different reduction templates 
targeting specific analyses. The very high electron/nuclear recoil discrimination 
achieved in the WIMP search carried out in the first science run (>99.98%) 
benchmarks these algorithms. These techniques should find application in data 
reduction from detectors with a large number of channels, beyond the field of 
rare event searches. 

5. Acknowledgments

The UK groups acknowledge the support of the Science & Technology Facilities
Council (STFC) for the ZEPLIN-III project and for maintenance and
operation of the underground Palmer laboratory which is hosted by Cleveland 
Potash Ltd (CPL) at Boulby Mine, near Whitby on the North-East coast of 
England. The project would not be possible without the co-operation of the 
management and staff of CPL. We also acknowledge support from a Joint In- 
ternational Project award, held at ITEP and Imperial College, from the Rus- 
sian Foundation of Basic Research (08-02-91851 KO a) and the Royal Society. 
LIP-Coimbra acknowledges financial support from Fundação para a Ciência
e Tecnologia (FCT) through the project-grants CERN/FP/109320/2009 and
CERN/FP/116374/2010, as well as the postdoctoral grants SFRH/BPD/27054/2006,
SFRH/BPD/47320/2008 and SFRH/BPD/63096/2009. This work was supported
in part by SC Rosatom, contract #H.4e.45.90.11.1059 from 10.03.2011.





